AD-A251  920 


),  1992,  BOSTON,  MASSACHUSETTS 


BRANCH RECOVERY WITH COMPILER-ASSISTED
MULTIPLE INSTRUCTION RETRY


N. J. Alewine*, S.-K. Chen, C.-C. Li†, W. K. Fuchs, W.-M. Hwu


Coordinated  Science  Laboratory 
University  of  Illinois  at  Urbana-Champaign 




Abstract 

In processing systems where rapid recovery from
transient faults is important, schemes for multiple
instruction rollback recovery may be appropriate.
Multiple instruction retry has been implemented in
hardware by researchers and also in mainframe computers.

This paper extends compiler-assisted instruction
retry to a broad class of code execution failures [1].
Five benchmarks were used to measure the performance
penalty of hazard resolution. Results indicate
that the enhanced pure software approach can produce
performance penalties consistent with existing
hardware techniques. A combined compiler/hardware
resolution strategy is also described and evaluated.
Experimental results indicate a lower performance
penalty than with either a totally hardware or totally
software approach.

1  Introduction 

Checkpointing is a well understood method for
implementing rollback recovery when system errors occur
[2-4]. In case of a detected fault, the system is rolled
back to a previous checkpoint containing a consistent
state of the system [5]. Full checkpointing may permit
long error detection latency at the expense of long
recovery times.

When transient processor errors occur, multiple
instruction retry can be an effective alternative to full
checkpointing and rollback recovery [6-8]. Multiple
instruction retry within a sliding window of a few
instructions [6], or re-execution of a few cycles [8], can be
implemented in parallel with concurrent error detection
for rapid recovery from transient processor errors.

*Currently on Resident Study leave from IBM, Boca Raton,
FL.

†Symbol Technologies Inc., Bohemia, NY.

‡This research was supported in part by the National
Aeronautics and Space Administration (NASA) under grant NASA
NAG 1-613, in cooperation with the Illinois Computer
Laboratory for Aerospace Systems and Software (ICLASS), and in part
by the Department of the Navy and managed by the Office of
the Chief of Naval Research under Contract N00014-91-J-1283.
This paper has been cleared through author affiliations.

This document has been approved for public release and sale;
its distribution is unlimited.

The issues associated with instruction retry are
similar to those with exception handling in out-of-order
instruction execution. If an instruction is to write to
a register and N is the maximum error (or exception)
detection latency, two copies of the data must be
maintained for N cycles. Hardware schemes such as reorder
buffers, history buffers, future files [9], and
micro-rollback [6] differ in where the updated and old values
reside, circuit complexity, and rollback efficiency.

A compiler-assisted approach to implementing
multiple instruction retry has recently been developed
by the authors [1]. In this technique, a series of
compiler transformations are used to eliminate
anti-dependencies of length < N. Our work was inspired
by the hardware-based micro-rollback design of Tamir
and Tremblay [6]. Our software approach produces
a performance impact consistent with hardware-based
techniques [6] and has the added benefit of making N
a compile-time parameter.

This paper extends compiler-assisted multiple
instruction retry to include a broad class of code
execution failures. The error model is expanded to
allow any legal path in the control flow graph (CFG),
thus allowing branch recovery. Compiler techniques
similar to those we previously developed [1] are shown
to be effective in resolving the hazards. Finally, a
combined compiler/hardware scheme is introduced which
reduces code growth, compilation time, and
performance impact.

The remainder of the paper is organized as follows:
Section 2 describes the error model and classifies
hazards. Section 3 describes the compiler techniques for
resolving the hazards. Section 4 introduces a simple
hardware scheme to resolve some of the hazards.
Section 5 presents performance results using the
compiler-only scheme and also the hardware-assisted compiler
scheme.

2 Error Model and Hazard Classification

The model of targeted errors is summarized as
follows. First, the maximum error detection latency is
N instructions. Second, memory and I/O have
delayed write buffers and can roll back N cycles. Third,
the states of the program counter are preserved by an
external recording device or by shadow registers [6].
Finally, the CPU state can be restored by loading the
correct contents of the register file and the program
counter.

In addition to the above, any error which does not
manifest itself as an illegal path in the CFG is also
allowed, provided that the register file contents do not
spontaneously change and data is not written to an
incorrect register location.

Errors  targeted  for  recovery  via  multiple  instruction 
retry  are  summarized  as  follows: 

1. CPU errors such as those caused by an ALU.

2. Incorrect values read from I/O, memory, the
register file, or external functional units such as the
floating point unit.

3. Correct/incorrect values read from incorrect
locations within the I/O, memory, or register file.

4. Incorrect branch decisions resulting from a
permissible error.

The code can be represented as a CFG, $G(V, E)$,
where V is the set of nodes denoting instructions and
E the set of edges denoting flow information. If there
is a direct control flow from instruction i, denoted $I_i$,
to $I_j$, where $I_i \in V$ and $I_j \in V$, then there is an edge
$(I_i, I_j) \in E$.

The  hazard  set  H  of  the  error  model  is  defined  as 
the  set  of  pseudo  or  machine  registers  whose  values  are 
inconsistent  during  different  executions  of  the  same 
instruction  sequence  due  to  retry. 

Hazards can be of two types: those that appear as
anti-dependencies [10] of length < N in $G(V, E)$, and
those that appear at branch boundaries. Figure 1
illustrates both types of hazards. If an error occurring
prior to instruction $I_j$ is detected after instruction $I_i$
and a rollback is attempted, an incorrect value may
be contained in register x. The first type of hazard
occurs if, after rollback, x is used in an instruction
along the original path (e.g., $I_j$). The second type
of hazard occurs if x is used in an instruction along a
new path (e.g., $I_k$). This can happen if the error
causing the rollback results in an incorrect branch decision
during execution of the original instruction sequence.
Hazards of the first type will be referred to as on-path
hazards. Hazards of the second type will be referred
to as branch hazards.

Figure 1: On-path and Branch Hazards.
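The on-path case can be illustrated with a minimal sketch for straight-line code. It assumes a hypothetical `Instr` record (one destination register, a tuple of source registers) and takes the length of an anti-dependency to be the index distance between the use and the later definition; neither the names nor this encoding come from the compiler described here. Branch hazards would additionally require following CFG edges and are not covered.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def on_path_hazards(code: List[Instr], n: int) -> List[Tuple[int, int, str]]:
    """Return (use_index, def_index, register) for every anti-dependency
    of length < n in a straight-line instruction sequence."""
    hazards = []
    for d, instr in enumerate(code):
        if instr.dest is None:
            continue
        # A use at index u is re-executed after a rollback of n
        # instructions only if it lies fewer than n positions before d.
        for u in range(max(0, d - n + 1), d + 1):
            if instr.dest in code[u].srcs:
                hazards.append((u, d, instr.dest))
    return hazards


example = [
    Instr("t", ("x",)),   # I_j: uses x
    Instr(None, ("t",)),  # unrelated instruction
    Instr("x", ("a",)),   # I_i: redefines x
]
```

With `n = 3` the sketch reports the use at index 0 and the definition at index 2 as an on-path hazard on `x`; with `n = 2` the two instructions are far enough apart that no hazard is reported.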

3  Compiler  Based  Hazard  Resolution 

Our previous techniques resolved on-path hazards
in four phases [1]. Phase 1 resolved pseudo register
hazards, phase 2 resolved machine register hazards,
phase 3 resolved inter-procedural register hazards, and
phase 4 used nop insertion to resolve the remaining
hazards. This section describes compiler techniques
for resolving branch hazards.

3.1  Pseudo  Registers 

The on-path hazard of Figure 1 can be resolved
by renaming the definition register in $I_i$ from x to y.
Node splitting and loop expansion are used to break all
data dependencies which require the use register in $I_j$
to be renamed as a result of renaming x [1, 10]. Loop
protection is used to maintain loop integrity during
node splitting and loop expansion. Renaming is also
effective in resolving branch hazards. The next step
is to see how node splitting, loop expansion and loop
protection apply to branch hazards.

Figures 2(a) and 2(b) show a typical data dependence
(requiring node splitting) and the node splitting
technique respectively. In Figure 2(a), renaming x in
$I_i$ to y will ultimately require the renaming of the
use register x in $I_j$ to y since multiple definitions of
x reach $I_k$. To break this dependence, the following
node splitting criterion is used: If multiple definitions
of x reach $I_k$ and x is in the live_in set of $I_k$, $I_k$ will
be split into two identical nodes. This "unzipping" is
shown in Figure 2(b). Loop protection assures that
no loop header is split [1].

Figure 2: Node Splitting.
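The mechanics of node splitting can be sketched on a CFG stored as a successor map. This is an illustration only: the function name `split_node`, the `K.0`/`K.1` naming of the copies, and the treatment of a self-loop as a stand-in for loop protection are all assumptions, and the reaching-definition and live-in checks of the criterion are taken to have been done by the caller.

```python
from typing import Dict, List


def split_node(succ: Dict[str, List[str]], k: str) -> Dict[str, List[str]]:
    """Give each predecessor of node k its own identical copy of k.

    succ maps every node name to its list of successors. The input is
    not modified; a new successor map is returned.
    """
    preds = [p for p, ss in succ.items() if k in ss]
    # Nothing to split with fewer than two predecessors. A node that
    # branches to itself is left alone, as a crude stand-in for the
    # rule that loop headers are never split.
    if len(preds) < 2 or k in preds:
        return {n: list(ss) for n, ss in succ.items()}
    new = {n: list(ss) for n, ss in succ.items() if n != k}
    for i, p in enumerate(preds):
        copy = f"{k}.{i}"
        new[copy] = list(succ[k])  # each copy keeps k's successors
        new[p] = [copy if s == k else s for s in new[p]]
    return new


cfg = {"A": ["K"], "B": ["K"], "K": ["C"], "C": []}
split = split_node(cfg, "K")
```

After the call, `A` reaches `K.0` and `B` reaches `K.1`, so a definition renamed on one incoming path only forces renaming in the copy on that path.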

Node splitting and loop protection operate on the
data flow parameters of the CFG. Since these
parameters are unaffected by the type of hazard considered,
both techniques work equally well for branch hazards.
This is not the case for the loop expansion
transformation.

Loop  expansion  is  used  for  resolving  a  hazard  which 
traverses  a  loop  back  edge.  We  have  experimentally 
observed  a  low  rate  of  occurrence  for  branch  hazards 
traversing  loop  back  edges.  We  therefore  resolve  all 
such  branch  hazards  in  the  nop  insertion  phase. 

3.2  Machine  Registers 

Once hazards have been eliminated through renaming,
they can reappear as the physical registers are
assigned. Figure 3 shows the elimination of on-path
and branch hazards by adding arcs to the dependency
graph used for register allocation.
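The effect of the added arcs can be sketched with a toy allocator. The assumptions here are substantial: the graph is modeled as an interference graph over pseudo registers, a greedy colouring stands in for the real register allocator, and the names `allocate` and `extra_arcs` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def allocate(interference: Dict[str, Set[str]],
             extra_arcs: Iterable[Tuple[str, str]],
             num_regs: int) -> Dict[str, int]:
    """Greedy colouring of an interference graph after adding hazard arcs.

    An extra arc (a, b) forbids a and b from sharing a physical register,
    so a hazard removed by renaming cannot reappear at allocation time.
    """
    graph = {v: set(ns) for v, ns in interference.items()}
    for a, b in extra_arcs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    assignment: Dict[str, int] = {}
    for v in sorted(graph):
        used = {assignment[u] for u in graph[v] if u in assignment}
        free = [r for r in range(num_regs) if r not in used]
        if not free:
            # A real allocator would generate spill code here.
            raise ValueError(f"no register available for {v}")
        assignment[v] = free[0]
    return assignment


g = {"a": {"b"}, "b": {"a"}, "c": set()}
plain = allocate(g, [], 2)
with_arc = allocate(g, [("a", "c")], 2)
```

Without the extra arc, `a` and `c` share register 0; with the arc between them they are forced apart. The `ValueError` path corresponds to the point where extra arcs would cause additional spill code.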

3.3 Inter-Procedural Hazards

Inter-procedural register saving conventions can
create immediate on-path hazards [1]. Branch hazards
are not immediately created at procedural boundaries
and therefore no additional action is taken.

Figure 3: Machine Register Hazards.

3.4  Nop  Insertion 

Spill code generated as a result of register allocation can
create on-path and branch hazards. A similar problem
exists with the stack pointer and frame pointer. Some
branch hazards may also remain that were left unresolved
by the loop expansion transformation. On-path
hazards are resolved by inserting nop instructions directly
before the hazard instruction so that the rollback will
be below the last use of the hazard register. This
technique does not work for branch hazards since the
distance between the definition and the use
instructions is not relevant. Instead, nop insertion is used
to increase the distance from the hazard instruction to
its nearest predecessor branch. In this case a rollback
will be below the branch.
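Both uses of nop insertion reduce to the same padding step, sketched below on a flat list of assembly strings. The helper name `pad_before` and the choice of `anchor_idx` (the last use of the hazard register for an on-path hazard, the nearest predecessor branch for a branch hazard) are assumptions made for illustration; the instruction strings are placeholders.

```python
from typing import List


def pad_before(code: List[str], hazard_idx: int, anchor_idx: int, n: int) -> List[str]:
    """Insert nops directly before code[hazard_idx] so that it ends up at
    least n instructions after code[anchor_idx]."""
    gap = hazard_idx - anchor_idx
    pad = max(0, n - gap)  # at most n nops, when the gap is zero
    return code[:hazard_idx] + ["nop"] * pad + code[hazard_idx:]


listing = [
    "beq r1, r2, L",   # nearest predecessor branch
    "add r3, r4, r5",
    "sub x, r3, r6",   # hazard instruction: defines x
]
padded = pad_before(listing, hazard_idx=2, anchor_idx=0, n=4)
```

Here two nops are inserted, leaving the hazard instruction four positions after the branch, so a rollback of four instructions from just past the definition restarts below the branch.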

4 Hardware Assisted Hazard Resolution

It was shown in Section 3 that both on-path and
branch hazards can be resolved using compiler
techniques. However, based on unique characteristics of
the two hazard types, we can design an efficient
combined compiler/hardware resolution technique.


The on-path hazard shown in Figure 1 is such that
after rollback of N instructions, the original N
instructions are re-executed leading back to $I_i$. Recovery in
the presence of on-path hazards can be aided by
maintaining a register read history of depth N in hardware.

Examining the branch hazard in Figure 1, it is seen
that at the time x is defined in $I_i$ it is not possible to
determine dynamically whether it is a hazard. After
retry, control flow may proceed down a path that uses
x prior to redefinition (in this case it is a hazard) or it
may proceed down a path where x is redefined before
being used (in this case there is no hazard). In contrast
to on-path hazards, branch hazards are a function of
possible future paths (i.e., after rollback). A delayed
write or history buffer could be used to resolve such
hazards; however, these schemes conservatively resolve
on-path hazards as well. An alternative is to use
compiler transformations to resolve branch hazards and
hardware to assist in on-path hazard resolution.

4.1  Hardware  Assist 

Figure 4 shows a hardware scheme to resolve
on-path hazards. In contrast to a write buffer [6], which
is attached to the input port of the register file, a read
buffer is attached to the output ports of the register
file. Each time a register is used it appears on the read
port and is pushed into the read buffer. If a register
$r_k$ is defined in $I_i$ and it is an on-path hazard, then $r_k$
must have been read within the last N cycles. In this
case, the read buffer will contain the old value and it
is permissible to write the new value into the
register file. In the event of a rollback of N instructions,
the contents of the read buffer are popped and loaded
back into the register file. For an on-path hazard, the
path taken after the rollback will be the same as the
path taken prior to rollback and each read of $r_k$ will
produce the same value as before. Branch hazards will
be removed by the compiler. It is assumed that the
read buffer is an integral part of the register file and
any error in the system does not corrupt the transfer
to the read buffer or its contents.

Figure 4: Read Buffer.

In contrast to a history buffer, which forces a read
of $r_k$ prior to writing $r_k$, the read buffer monitors the
register file ports and stores only the values read as
part of the normal program flow and therefore should
not significantly impact the register file performance
or CPU cycle time. The read buffer is twice the width
of a register with a depth of N. This is twice the size of
a delayed write buffer, but eliminates the requirement
for complex bypassing and prioritization logic.
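The behaviour of the read buffer can be approximated with a small software model. It is a sketch under several assumptions: one buffer entry per instruction holding up to two (register, value) pairs, a rollback that always drains the whole buffer, and restoration newest-first so that the oldest logged value of a register is the one left in place. The class and method names are hypothetical and say nothing about the actual circuit.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Sequence, Tuple


class ReadBufferRegisterFile:
    """Toy register file with a read buffer of depth n."""

    def __init__(self, n: int, initial: Dict[str, int]) -> None:
        self.regs = dict(initial)
        # deque(maxlen=n) silently discards the oldest entry when full.
        self.buffer: Deque[List[Tuple[str, int]]] = deque(maxlen=n)

    def execute(self, dest: Optional[str], srcs: Sequence[str],
                fn: Callable[..., int]) -> None:
        """Read the sources, log what was read, then write the result."""
        values = [self.regs[s] for s in srcs]
        self.buffer.append(list(zip(srcs, values)))
        if dest is not None:
            self.regs[dest] = fn(*values)

    def rollback(self) -> None:
        """Pop every logged read, newest first, back into the registers."""
        while self.buffer:
            for reg, old in self.buffer.pop():
                self.regs[reg] = old


def demo() -> Dict[str, int]:
    rf = ReadBufferRegisterFile(n=3, initial={"x": 1, "a": 10, "t": 0})
    rf.execute("t", ["x", "a"], lambda x, a: x + a)  # I_j: reads x
    rf.execute("x", ["a"], lambda a: a * 2)          # I_i: overwrites x
    rf.rollback()
    return rf.regs
```

In `demo`, the read of `x` by the first instruction leaves its old value in the buffer, so the rollback restores `x` to 1 even though the second instruction overwrote it. Register `t` keeps its new value, which is harmless because re-execution redefines it before any use.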

4.2  Combined  Approach 

On-path hazards were 3 to 4 times more frequent
than branch hazards across the five benchmark
programs evaluated. This would imply an improvement
in performance when resolving only branch hazards
using compiler techniques. The difficulty arises in
determining which branch hazards to resolve. In
addition to resolving all on-path hazards, the read buffer
will resolve some branch hazards.

Figure 5 shows an on-path hazard and a branch
hazard, both with definitions of x in $I_i$ and uses of x,
after rollback, in instructions $I_j$ and $I_k$ respectively.
Note that if path l is initially taken, the read buffer
will contain the old value of x and rollback would
be successful. However, if path m is taken, the read
buffer will not contain the old value of x and rollback
would be unsuccessful. If only paths such as l exist,
the presence of the on-path hazard assures successful
rollback. In this case, resolution of the branch hazard
using compiler techniques is not necessary.

The current software calculates on-path hazards
and total hazards; however, it is not yet capable of
accounting for read buffer resolution of branch
hazards. Compiler resolution of (total hazards minus
on-path hazards) would be overly optimistic and result in
an incorrect performance impact. In lieu of direct
measurements, a conservative measure of the range of
improvement that could be expected with the read buffer
was obtained using a transformation on the
unmodified (i.e., original) assembler level code.

The transformation creates on-path hazards when
necessary to assure that all branch hazards are
resolved by the read buffer. Given one such branch
hazard which defines physical register $r_k$ at instruction
$I_i$, the transformation inserts a MOV $r_k$, $r_k$
instruction immediately before $I_i$. This guarantees that all
paths leading to $I_i$ are like path l in Figure 5. Section
5 includes experimental results using this
transformation.

Figure 5: Read Buffer Resolution of Branch Hazards
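The assembler-level transformation can be sketched as a simple list rewrite. It assumes that the inserted MOV is a self-move whose only purpose is to read the register about to be redefined, and that hazards are supplied as (instruction index, register) pairs; the function name, the operand syntax, and the sample instructions are all hypothetical.

```python
from typing import Iterable, List, Tuple


def insert_self_moves(code: List[str],
                      hazards: Iterable[Tuple[int, str]]) -> List[str]:
    """Insert 'MOV r, r' immediately before each hazardous definition.

    hazards holds (index of the defining instruction, register name).
    Insertions run from the highest index down so that earlier indices
    stay valid while the list grows.
    """
    out = list(code)
    for idx, reg in sorted(hazards, reverse=True):
        out.insert(idx, f"MOV {reg}, {reg}")
    return out


original = ["beq r1, r2, L", "add r5, r6, r7", "sub r3, r4, r4"]
transformed = insert_self_moves(original, [(2, "r3"), (1, "r5")])
```

Each inserted move places the old value of its register in the read buffer just before the definition, which is what makes every path into the definition behave like path l.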

5  Performance  Evaluation 

5.1  Implementation 

The transformation algorithms have been
implemented in the MIPS code generator of the IMPACT C
compiler [11]. Transformations resolving pseudo
register hazards (loop protection, node splitting, and loop
expansion) are called just before register allocation.
Transformations resolving machine register hazards
are called after the live range constraints have been
generated and before physical register allocation. The
nop insertion algorithm is called before the assembly
code output routine.

5.2  Benchmarks 

Five benchmark programs were cross-compiled on
a SPARCserver 490 and run on a DECstation 3100.
QUEEN is based on the eight-queen program but with
12 queens as input. QSORT implements the quick
sort algorithm to process a randomly generated array.
Both QUEEN and QSORT use recursive calls. WC
and CMP are well-known UNIX utilities and PUZZLE
is a simple game.

The results are summarized in Figures 6 through
15. Two groups of results are shown for each
benchmark. The first group shows performance measured
by run time overhead (OH), in seconds, on the
DECstation 3100 and the second group by code size
overhead, in number of assembly instructions emitted by
the code generator, not including the library routines
and other fixed overhead. s/w: op represents
performance impact using software transformations to
remove on-path hazards only, and s/w: op br shows
performance impact using similar transformations to
remove both on-path and branch hazards.

5.3  Performance  Analysis 

The compiler transformations introduce
performance impact in several ways. Loop protection
inserts save/restore operations at the head and tail of
the loop. This increases the path length and therefore
increases run time. Additional arcs in the dependency
graph can cause more spill code to be generated,
increasing memory references and cache misses. Nop
insertion can be costly since up to N nops could be
inserted for each unresolved hazard. Finally, the
increase in code size (mainly due to loop expansion) may
cause more runtime cache misses.

The loop expansion transformation, however, can
improve performance over a compiler that does not
have this optimization technique, as demonstrated by
the run-time results for CMP and PUZZLE [12]. Once
the loop is expanded, some condition checks and index
operations can be eliminated. Also, the save/restore
operations from loop protection shorten the live ranges
of some registers, thus allowing more efficient
register allocation. Only the latter optimization is
implemented in the software described in this paper.

5.3.1  Compiler-Only 

It is interesting to note that there is negligible
incremental performance impact introduced by resolving
branch hazards in addition to on-path hazards for the
benchmarks evaluated. Two key factors account for this
result. First, on-path hazards dominate in frequency
of occurrence and second, resolving an on-path
hazard at instruction $I_i$ through renaming can sometimes
resolve a branch hazard at instruction $I_i$.
Additionally, resolving a similar on-path hazard with nop
insertion may resolve a corresponding branch hazard
by increasing the distance between it and its nearest
predecessor branch.


5.3.2  Compiler/Hardware 

Figures 6 through 15 also show the run time and code
size overheads for each benchmark assuming the read
buffer to resolve on-path hazards and the assembler
level transformation described in Section 4. These
measurements are labeled h/w: op, s/w: br.

Figure 6: QUEEN, Runtime Overhead.
Figure 7: QUEEN, Program Size Overhead.

The results are worst case in the sense that many
of the branch hazards could have been resolved with
no performance impact using the compiler techniques
of Section 3. Instead, they are resolved by insertion
of MOV instructions, which causes a guaranteed
(although small) performance impact.

All benchmarks except one have less than 4%
performance impact and all benchmarks have less than
14% code size increase. Given the read buffer feature
and the option to use compiler techniques as well, all
benchmarks are below 4% performance impact with
an average impact of 1%. The run time results of
PUZZLE indicate that compiler techniques are still
useful in reducing performance penalties. These
compiler techniques, however, have the disadvantages of
requiring recompilation, long compilation times, and
significant code growth.

Figure 8: WC, Runtime Overhead.
Figure 9: WC, Program Size Overhead.

Work is underway to modify the compiler
transformations to allow branch hazard resolution
only. All indications are that the performance impact,
code growth and compilation time will be reduced
below the current levels. Our experiments indicate
that a combined compiler/hardware scheme for
hazard resolution can produce lower performance
penalties than either a compiler-only scheme or a
hardware-only scheme. It is also noted that the code size is
reduced relative to a compiler-only scheme and that
hardware complexity is reduced relative to a delayed
write scheme.

Figure 11: QSORT, Program Size Overhead.
Figure 13: CMP, Program Size Overhead.


6  Concluding  Remarks 

Two schemes have been described to efficiently
support multiple instruction rollback with branch
recovery: a compiler-only scheme and a combined
compiler/hardware scheme. Hazard classification has
proved useful in construction of the combined scheme.
Compiler transformations such as pseudo register
renaming, node splitting, loop protection, and loop
expansion were shown to be effective in resolving
on-path and branch hazards with negligible performance
impacts over resolving on-path hazards alone. The
compiler-only approach yields performance impacts
consistent with previous compiler techniques [1] and
hardware techniques [6]. A hardware assisted scheme
was introduced to resolve on-path hazards by
maintaining a window of instruction read history.

Our hardware assisted scheme introduces little
performance impact and reasonable additional circuitry.
Compiler techniques are used to resolve the
remaining branch hazards with a modest increase in overall
compile time. The performance measurements
indicate that the compiler/hardware scheme can achieve
lower performance impact than either a compiler-only
scheme or a delayed write hardware scheme. It should
be noted that our scheme applies only to the CPU and
requires additional hardware to maintain the states of
the program counter, program status word, etc. The
read buffer is twice the size of a delayed write buffer
but avoids the requirement for bypassing and
prioritization logic.

Figure 14: PUZZLE, Runtime Overhead.
Figure 15: PUZZLE, Program Size Overhead.